{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Deep Learning Tutorial for NLP with Tensorflow\n",
    "This tutorial borrows from here and tries to show how to work with NLP tasks using Tensorflow. Borrowed material from [here]([here](https://github.com/rguthrie3/DeepLearningForNLPInPytorch/blob/master/Deep%20Learning%20for%20Natural%20Language%20Processing%20with%20Pytorch.ipynb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import tensorflow as tf\n",
    "import os\n",
    "import numpy as np\n",
    "\n",
    "tf.set_random_seed(1) # seed to obtain similar outputs\n",
    "os.environ['CUDA_VISIBLE_DEVICES'] = '' # avoids using GPU for this session"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1. Introduction of Tensors in Tensorflow\n",
    "How to create and handle tensors, which are the building blocks for any deep learning architecture you wish to implement. You can find more information about tensors and why they are used for deep learning modeling here [LINK]. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Creating Tensors"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Tensor(\"Const:0\", shape=(3,), dtype=float32) \n",
      " [ 1.  2.  3.]\n",
      "Tensor(\"Const_1:0\", shape=(2, 3), dtype=float32) \n",
      " [[ 1.  2.  3.]\n",
      " [ 4.  5.  6.]]\n",
      "Tensor(\"Const_2:0\", shape=(2, 2, 3), dtype=float32) \n",
      " [[[  1.   2.   3.]\n",
      "  [  4.   5.   6.]]\n",
      "\n",
      " [[  7.   8.   9.]\n",
      "  [ 10.  11.  12.]]]\n"
     ]
    }
   ],
   "source": [
    "# start session\n",
    "sess = tf.Session()\n",
    "\n",
    "#initialize_op =  tf.global_variables_initializer()\n",
    "#sess.run(initialize_op)\n",
    "\n",
    "# 1D tensor (also known as a 1D vector)\n",
    "V_data = [1., 2., 3.]\n",
    "V = tf.constant(V_data, tf.float32)\n",
    "print(V, \"\\n\",sess.run(V))\n",
    "\n",
    "# 2D tensor (also known as a 2D matrix)\n",
    "M_data = [[1.,2.,3.],[4.,5.,6.]]\n",
    "M = tf.constant(M_data, tf.float32)\n",
    "print(M,\"\\n\", sess.run(M))\n",
    "\n",
    "# 3D tensor of size 2x2x2\n",
    "T_data = [[[1.,2., 3.], [4.,5., 6.]],\n",
    "          [[7.,8., 9.], [10.,11., 12]]]\n",
    "T = tf.constant(T_data, tf.float32)\n",
    "print(T, \"\\n\", sess.run(T))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also operate on tensors the same way we would on standard numpy matrices. The following show some examples."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Tensor(\"Const:0\", shape=(3,), dtype=float32) \n",
      " 1.0\n",
      "Tensor(\"Const_1:0\", shape=(2, 3), dtype=float32) \n",
      " [ 1.  2.  3.]\n",
      "Tensor(\"Const_2:0\", shape=(2, 2, 3), dtype=float32) \n",
      " [[ 1.  2.  3.]\n",
      " [ 4.  5.  6.]]\n"
     ]
    }
   ],
   "source": [
    "# Index into V and produce a scalar\n",
    "print(V, \"\\n\",sess.run(V[0]))\n",
    "\n",
    "# Index into M and produce a scalar\n",
    "print(M, \"\\n\",sess.run(M[0]))\n",
    "\n",
    "# Index into T and produce a matrix\n",
    "print(T, \"\\n\",sess.run(T[0]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also create a tensor with random data with a specified dimension"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Tensor(\"random_normal:0\", shape=(3, 4, 5), dtype=float32) \n",
      " [[[ 2.92425132  0.1904023  -0.0999928   0.54333866  0.44673878]\n",
      "  [-0.74975967 -0.54365152 -0.11328146 -0.73304367  0.84540504]\n",
      "  [-0.98038179  0.22330996  0.00350116  0.63260406  0.53027982]\n",
      "  [ 0.11028062 -1.06750703  0.78615493  1.87881851  0.29188421]]\n",
      "\n",
      " [[ 0.65026432 -0.20335354 -0.80023402 -0.09808423 -0.38593856]\n",
      "  [-0.82919121  0.76829058 -0.30424473  0.76561087  0.09717353]\n",
      "  [-0.03512896 -0.44248524 -0.63942212 -1.37797427  0.00825686]\n",
      "  [-0.61289674 -2.01852942  0.82595825 -0.49025506  2.95250511]]\n",
      "\n",
      " [[-1.63209784  1.47359169  0.18774952  2.14593959 -2.32931423]\n",
      "  [-1.32003772  0.49694589 -0.96136755 -0.05908597  0.28159368]\n",
      "  [-3.31978583  0.13563539  1.1548022  -0.09598015  0.69415414]\n",
      "  [-1.30712473 -0.62319744  0.53103876  0.84003556 -0.92114925]]]\n"
     ]
    }
   ],
   "source": [
    "x = tf.random_normal([3,4,5])\n",
    "print(x, \"\\n\", sess.run(x))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Operations with Tensors"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[ 5.  7.  9.]\n"
     ]
    }
   ],
   "source": [
    "x = tf.constant([1.,2.,3.])\n",
    "y = tf.constant([4.,5.,6.])\n",
    "z = x+y\n",
    "print(sess.run(z))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Concatenation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[ 0.58697659  0.64898354 -0.00182672  0.46972319 -0.41258761]\n",
      " [ 0.50580209  1.08674705 -0.98539138 -0.37500492 -0.27378666]\n",
      " [-0.10245068 -0.42229936  0.53483063 -0.90609199  0.2286306 ]\n",
      " [ 1.33724499  0.54341757 -0.11454915 -0.13848655  0.70253915]\n",
      " [ 0.18164249 -0.91935599  1.01541901 -0.56042647  0.84225392]]\n",
      "\n",
      "\n",
      "[[ 1.01857746 -0.81870914 -0.97899282 -0.23221652  0.54886436 -1.43723702\n",
      "   1.37461388  0.19729844]\n",
      " [-2.54092383  0.40189755 -0.17309178 -0.0217983   1.60369706  0.98966628\n",
      "  -1.32113194 -0.79742384]]\n"
     ]
    }
   ],
   "source": [
    "# concatenate by rows (axis=0)\n",
    "x_1 = tf.random_normal([2,5])\n",
    "y_1 = tf.random_normal([3,5])\n",
    "z_1 = tf.concat([x_1, y_1], 0)\n",
    "print(sess.run(z_1))\n",
    "\n",
    "print(\"\\n\")\n",
    "# concatenate by columns (axis=1)\n",
    "x_2 = tf.random_normal([2,3])\n",
    "y_2 = tf.random_normal([2,5])\n",
    "z_2 = tf.concat([x_2, y_2], 1)\n",
    "print(sess.run(z_2))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "More on concatenation [here](https://www.tensorflow.org/api_docs/python/tf/concat)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Reshaping tensors"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[[ 0.20525731 -0.89691323  1.07025433 -0.15564451]\n",
      "  [-0.40598848 -0.29619998 -0.82002103 -0.37928221]\n",
      "  [ 0.12490512  0.03255307  1.95663142 -1.24699843]]\n",
      "\n",
      " [[ 0.42128563 -0.4631182   0.95954072 -1.19090807]\n",
      "  [-1.57321525  0.02937024  1.77990592 -0.65396672]\n",
      "  [-1.10802829  0.33289853 -1.41283214 -0.69903141]]]\n",
      "\n",
      "\n",
      "[[-1.53472745 -0.57235247 -1.88776898 -0.37644315  1.04152322  0.10515906\n",
      "  -0.41811749 -1.1674329   0.40431187  0.34636515 -0.1303246   2.76180029]\n",
      " [-1.75117695 -0.15002298  0.65409285  0.44413102 -0.12226854  0.90484196\n",
      "   0.21590623  1.56579196 -0.41805771 -0.28057054 -0.06562074 -0.55228049]]\n",
      "\n",
      "\n",
      "[[-0.51017123  0.04450967 -0.04642046  0.04758257 -2.67784333 -2.01859713\n",
      "  -2.0040257  -0.24659356  1.00061321  0.10434803 -0.38498423  0.78364938]\n",
      " [ 1.9038918  -1.16755736 -0.9858014   0.01895356  0.82711017  0.77959973\n",
      "  -1.15610087  0.94547653  0.65504938 -0.99168533  0.3092854  -0.84525639]]\n"
     ]
    }
   ],
   "source": [
    "x = tf.random_normal([2,3,4])\n",
    "print(sess.run(x))\n",
    "print(\"\\n\")\n",
    "print(sess.run(tf.reshape(x,[2,12])))\n",
    "print(\"\\n\")\n",
    "print(sess.run(tf.reshape(x,[2,-1]))) # -1 enable automatic infer of dimension"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Noticed that everything time we do sess.run(x) the values of the tensor keeps changing. We can fix this as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[[ 1.58181369  0.81615424  1.2647419   2.44118857]\n",
      "  [-0.64325184 -0.52695727  0.87262011 -2.33419275]\n",
      "  [-0.91412055 -0.65119022 -0.2877526  -0.03355991]]\n",
      "\n",
      " [[ 0.03930664 -0.3554261  -1.84036934  0.64453197]\n",
      "  [-0.70999211 -0.07287408  0.58443463  0.53111529]\n",
      "  [-1.32720828  0.61744171 -1.12603593  1.55679595]]]\n",
      "\n",
      "\n",
      "[[ 1.58181369  0.81615424  1.2647419   2.44118857 -0.64325184 -0.52695727\n",
      "   0.87262011 -2.33419275 -0.91412055 -0.65119022 -0.2877526  -0.03355991]\n",
      " [ 0.03930664 -0.3554261  -1.84036934  0.64453197 -0.70999211 -0.07287408\n",
      "   0.58443463  0.53111529 -1.32720828  0.61744171 -1.12603593  1.55679595]]\n"
     ]
    }
   ],
   "source": [
    "x = tf.random_normal([2,3,4])\n",
    "x_result = sess.run(x) # this makes the results stable\n",
    "print(x_result)\n",
    "print(\"\\n\")\n",
    "print(sess.run(tf.reshape(x_result,[2,-1])))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2. Computation Graphs and Automatic Differentiation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[-0.31353098 -0.52566063]\n",
      " [-2.63175964 -2.74863577]]\n"
     ]
    }
   ],
   "source": [
    "# slightly different from pytorch but still the same effect applies\n",
    "x = tf.random_normal([2,2])\n",
    "y = tf.random_normal([2,2])\n",
    "z = x+y\n",
    "print(sess.run(z))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    }
   ],
   "source": [
    "# we specify a session to run the evaluation\n",
    "with sess.as_default():\n",
    "    var_x = tf.Variable(x, name=\"X\")\n",
    "    var_y = tf.Variable(y, name=\"Y\")\n",
    "    var_z = var_x + var_y\n",
    "\n",
    "    var_x.initializer.run()\n",
    "    var_y.initializer.run()\n",
    "\n",
    "    var_z.eval()\n",
    "    print()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[<tf.Tensor 'gradients/add_2_grad/Reshape:0' shape=(2, 2) dtype=float32>]"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tf.gradients(var_z, var_x) # compute gradients of var_z with regards to var_x"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3. Deep Learning Building Blocks: Affine maps, non-linearities and objectives\n",
    "The affine map is a function $f(x)$ where\n",
    "$$ f(x) = Wx + b $$\n",
    "\n",
    "where $W$ refers to a weight matrix and vectors $x,b$ are input and bias, respectively. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<tf.Tensor 'add_3:0' shape=(128, 30) dtype=float32>"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# We do this slightly different in tensorflow using matmul operation\n",
    "X = tf.Variable(tf.random_normal([128,20]), name=\"X\")\n",
    "W = tf.Variable(tf.random_normal([20,30]), name=\"W\")\n",
    "b = tf.Variable(tf.random_normal([30]), name=\"b\")\n",
    "\n",
    "tf.matmul(X, W) + b"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Non-linearities\n",
    "Examples: $tanh(x)$, $\\sigma(x)$, and $ReLU(x)$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[ 1.61521125  0.01385026]\n",
      " [ 0.04062277 -0.31989396]]\n",
      "[[ 1.61521125  0.01385026]\n",
      " [ 0.04062277  0.        ]]\n"
     ]
    }
   ],
   "source": [
    "data = tf.Variable(tf.random_normal([2,2]), name=\"data\")\n",
    "data.initializer.run(session=sess)\n",
    "print(sess.run(data))\n",
    "print(sess.run(tf.nn.relu(data)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Softmax and Probabilities\n",
    "We are computing for $softmax(x)$ where the $i^{th}$ component of $softmax(x)$ is:\n",
    "$$ \\frac{\\exp(x_i)}{\\sum_j \\exp(x_j)} $$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[ 1.60825002 -2.16805053]\n",
      "[ 0.97760576  0.02239429]\n",
      "1.0\n",
      "[-0.0226488  -3.79894924]\n"
     ]
    }
   ],
   "source": [
    "data = tf.Variable(tf.random_normal([2]), name=\"data\")\n",
    "data.initializer.run(session=sess)\n",
    "print(sess.run(data))\n",
    "print(sess.run(tf.nn.softmax(data)))\n",
    "print(sess.run(tf.reduce_sum(tf.nn.softmax(data))))\n",
    "print(sess.run(tf.log(tf.nn.softmax(data))))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4. Optimization and Training\n",
    "We compute for gradients that are used to update our weights so as to minimize the loss function as the objective: \n",
    "\n",
    "$$ \\theta^{(t+1)} = \\theta^{(t)} - \\eta \\nabla_\\theta L(\\theta) $$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 5. Creating Network Components in Tensorflow\n",
    "#### Logistic regression on BOW classifier\n",
    "Given a BOW vector representation $x$ of a sentence $s$. The output of our network is computed as follows:\n",
    "$$log(softmax(Wx+b))$$\n",
    "where \n",
    "$$\\forall i\\in \\{|s|\\}: x_i  = count(s_i)$$ and $|s|$ is the number of words in a given sentence"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'me': 0, 'gusta': 1, 'comer': 2, 'en': 3, 'la': 4, 'cafeteria': 5, 'Give': 6, 'it': 7, 'to': 8, 'No': 9, 'creo': 10, 'que': 11, 'sea': 12, 'una': 13, 'buena': 14, 'idea': 15, 'is': 16, 'not': 17, 'a': 18, 'good': 19, 'get': 20, 'lost': 21, 'at': 22, 'Yo': 23, 'si': 24, 'on': 25}\n"
     ]
    }
   ],
   "source": [
    "data = [ (\"me gusta comer en la cafeteria\".split(), \"SPANISH\"),\n",
    "         (\"Give it to me\".split(), \"ENGLISH\"),\n",
    "         (\"No creo que sea una buena idea\".split(), \"SPANISH\"),\n",
    "         (\"No it is not a good idea to get lost at sea\".split(), \"ENGLISH\") ]\n",
    "\n",
    "test_data = [ (\"Yo creo que si\".split(), \"SPANISH\"),\n",
    "              (\"it is lost on me\".split(), \"ENGLISH\")]\n",
    "\n",
    "# defining the training set labels or targets\n",
    "label_to_ix = { \"SPANISH\": [1,0], \"ENGLISH\": [0,1] }\n",
    "\n",
    "# word_to_ix maps each word in the vocab to a unique integer, which will be its\n",
    "# index into the Bag of words vector\n",
    "word_to_ix = {}\n",
    "for sent, _ in data + test_data:\n",
    "    for word in sent:\n",
    "        if word not in word_to_ix:\n",
    "            word_to_ix[word] = len(word_to_ix)\n",
    "print (word_to_ix)\n",
    "\n",
    "VOCAB_SIZE = len(word_to_ix)\n",
    "NUM_LABELS = 2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "class BOWClassifier(object):\n",
    "    def __init__(self, num_labels, vocab_size):\n",
    "        # input\n",
    "        self.X = tf.placeholder(tf.float32, [None, vocab_size])\n",
    "        self.Y = tf.placeholder(tf.float32, [None, num_labels])\n",
    "        \n",
    "        # weights and bias\n",
    "        self.W = tf.Variable(tf.random_uniform([vocab_size, num_labels]))\n",
    "        self.b = tf.Variable(tf.random_uniform([num_labels]))\n",
    "        \n",
    "        # log probabilties of output\n",
    "        self.logits = tf.matmul(self.X, self.W) + self.b\n",
    "        self.log_probs = tf.log(tf.nn.softmax(self.logits))\n",
    "        \n",
    "        # loss\n",
    "        self.loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=self.logits, labels=self.Y))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "def make_bow_vector(sentence, word_to_ix):\n",
    "    vec = np.zeros(len(word_to_ix)) # need to find what is the way to assign to tf tensors\n",
    "    for word in sentence:\n",
    "        vec[word_to_ix[word]] += 1\n",
    "    return vec.reshape(1, -1)\n",
    "\n",
    "def make_target(label, label_to_ix):\n",
    "    # need to binarize the labels => [0,1] or [1,0]\n",
    "    return np.array([label_to_ix[label]])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "My observation is that with Pytorch it is easy to create variables and assign values to them. With tensorflow this is not so easy, therefore I preferred to use numpy to assist with the make_bow_vector function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [],
   "source": [
    "model = BOWClassifier(NUM_LABELS, VOCAB_SIZE)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<tf.Variable 'Variable:0' shape=(26, 2) dtype=float32_ref>"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# parameters can be obtained directly\n",
    "model.W"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's just obtain the log probabilities of the output"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [],
   "source": [
    "with tf.Session() as sess:    \n",
    "    # initialize and run all variables so that we can use their values directly\n",
    "    init = tf.global_variables_initializer()\n",
    "    sess.run(init)\n",
    "    \n",
    "     # prepare bow vector\n",
    "    sample = data[0]\n",
    "    bow_vector = make_bow_vector(sample[0], word_to_ix)\n",
    "    \n",
    "    log_probs = sess.run(model.log_probs, feed_dict = {model.X: bow_vector})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[-0.12336739 -2.15363789]]\n"
     ]
    }
   ],
   "source": [
    "print(log_probs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now let's print the matrix parameters corresponding to as specific word such as \"creo\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[-0.28383601 -1.39792216]]\n",
      "[[-1.90972412 -0.16031106]]\n"
     ]
    }
   ],
   "source": [
    "with tf.Session() as sess:    \n",
    "    # initialize and run all variables so that we can use their values directly\n",
    "    init = tf.global_variables_initializer()\n",
    "    sess.run(init)\n",
    "    \n",
    "    for instance, label in test_data:\n",
    "        bow_vector = make_bow_vector(instance, word_to_ix)\n",
    "        params, log_probs = sess.run([model.W, model.log_probs], feed_dict = {model.X: bow_vector})\n",
    "        print(log_probs)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[ 0.69216204  0.48723626]\n"
     ]
    }
   ],
   "source": [
    "print(params.T[:,word_to_ix[\"creo\"]])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here we train the model and use optimization to minimize lost. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.918552\n",
      "0.297093\n",
      "0.17745\n",
      "0.128839\n",
      "0.101833\n",
      "0.08441\n",
      "0.0721463\n",
      "0.0630087\n",
      "0.0559212\n",
      "0.0502563\n",
      "0.0456219\n",
      "0.0417586\n",
      "0.0384884\n",
      "0.0356845\n",
      "0.0332538\n",
      "Optimization finished!!!\n",
      "[[-0.15573406 -1.93646169]]\n",
      "[[-3.43429089 -0.03277975]]\n",
      "[ 0.88358289  0.29581532]\n"
     ]
    }
   ],
   "source": [
    "with tf.Session() as sess:    \n",
    "    # training procedure\n",
    "    optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)\n",
    "    train_op = optimizer.minimize(model.loss)\n",
    "    \n",
    "    # initialize and run all variables so that we can use their values directly\n",
    "    init = tf.global_variables_initializer()\n",
    "    sess.run(init)\n",
    "    \n",
    "    # train cycle with training data\n",
    "    for epoch in range(15):\n",
    "        for instance, label in data:\n",
    "            bow_vector = make_bow_vector(instance, word_to_ix)\n",
    "            target = make_target(label, label_to_ix)\n",
    "            _, loss, params, log_probs = sess.run([train_op, model.loss, model.W, model.log_probs], \n",
    "                                         feed_dict = {model.X: bow_vector, model.Y:target})\n",
    "        print(loss)\n",
    "    print(\"Optimization finished!!!\")\n",
    "    \n",
    "    # test with testing data\n",
    "    for instance, label in test_data:\n",
    "        bow_vec = make_bow_vector(instance, word_to_ix)\n",
    "        params, log_probs = sess.run([model.W, model.log_probs],\n",
    "                                    feed_dict={model.X: bow_vec})\n",
    "        print(log_probs)\n",
    "    print(params.T[:,word_to_ix[\"creo\"]])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can observe that the log probability for Spanish is much higher in the first test sample, while the log probability for English is much higher in the second example, as it corresponds the testing data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[ 0.90241164 -0.08874875]\n"
     ]
    }
   ],
   "source": [
    "print(params.T[:,word_to_ix[\"gusta\"]])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 6. Word Embeddings: Enconding Lexical Semantics\n",
    "We aim to build dense representations of a vocabulary to obtain semantic similarity between words, which helps to support the distributional hypothesis (words appearing in similar contexts are related to each other semantically).\n",
    "\n",
    "Each vector representation of a word contains some semantics attributes that determine similar words through some similarity measure like cosine similarity.\n",
    "\n",
    "These latent semantic attributes (features) are determined or learned automatically by a neural network. In other words, the parameters of the model are the word embeddings, which are learned during training. The attributes learned cannot be interpreted since they are learned by the neural network during the training.\n",
    "\n",
    "\"In summary, word embeddings are a representation of the semantics of a word, efficiently encoding semantic information that might be relevant to the task at hand. You can embed other things too: part of speech tags, parse trees, anything! The idea of feature embeddings is central to the field.\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Word Embeddings in Tensorflow\n",
    "First, we build a lookup table which keeps indexes. The results is a $|V|\\times D$ matrix, where $|V|$ is the size of the vocabulary and $D$ is the dimensionality of the embeddings. This means that a word with index $i$ has its embedding stored in the $i$th row of the matrix."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [],
   "source": [
    "sess = tf.Session()\n",
    "word_to_ix = { \"hello\": 0, \"world\": 1 }\n",
    "word_ids = [0,1]\n",
    "embeds = tf.Variable(tf.random_uniform([2,5]), name=\"word_embeddings\")\n",
    "embedded_word_ids = tf.nn.embedding_lookup(embeds, word_ids)\n",
    "\n",
    "#print(sess.run(embedded_word_ids))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Example: N-gram Language Modeling \n",
    "In this model we aim to predict target word from word sequence; i.e., we aim to compute:\n",
    "$$ P(w_i | w_{i-1}, w_{i-2}, \\dots, w_{i-n+1} ) $$\n",
    ", where $w_i$ is the ith word of the sequence. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[(['When', 'forty'], 'winters'), (['forty', 'winters'], 'shall'), (['winters', 'shall'], 'besiege')]\n"
     ]
    }
   ],
   "source": [
    "CONTEXT_SIZE = 2\n",
    "EMBEDDING_DIM = 10\n",
    "# We will use Shakespeare Sonnet 2\n",
    "test_sentence = \"\"\"When forty winters shall besiege thy brow,\n",
    "And dig deep trenches in thy beauty's field,\n",
    "Thy youth's proud livery so gazed on now,\n",
    "Will be a totter'd weed of small worth held:\n",
    "Then being asked, where all thy beauty lies,\n",
    "Where all the treasure of thy lusty days;\n",
    "To say, within thine own deep sunken eyes,\n",
    "Were an all-eating shame, and thriftless praise.\n",
    "How much more praise deserv'd thy beauty's use,\n",
    "If thou couldst answer 'This fair child of mine\n",
    "Shall sum my count, and make my old excuse,'\n",
    "Proving his beauty by succession thine!\n",
    "This were to be new made when thou art old,\n",
    "And see thy blood warm when thou feel'st it cold.\"\"\".split()\n",
    "# we should tokenize the input, but we will ignore that for now\n",
    "# build a list of tuples.  Each tuple is ([ word_i-2, word_i-1 ], target word)\n",
    "trigrams = [ ([test_sentence[i], test_sentence[i+1]], test_sentence[i+2]) for i in range(len(test_sentence) - 2) ]\n",
    "print (trigrams[:3]) # print the first 3, just so you can see what they look like"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [],
   "source": [
    "vocab = set(test_sentence)\n",
    "word_to_ix = { word: i for i, word in enumerate(vocab) }"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 85,
   "metadata": {},
   "outputs": [],
   "source": [
    "class NGramLanguageModeler(object):\n",
    "    def __init__(self, vocab_size, embedding_dim, context_size):\n",
    "        \n",
    "        # inputs and outputs\n",
    "        self.X = tf.placeholder(tf.int32, [ context_size ]) # context (word sequence)\n",
    "        self.Y = tf.placeholder(tf.int32, [1,1]) # target word \n",
    "        \n",
    "        # Embeddings\n",
    "        self.embeddings = tf.Variable(tf.random_uniform([vocab_size, embedding_dim], -1.0, 1.0))\n",
    "        self.embed = tf.nn.embedding_lookup(self.embeddings, self.X)\n",
    "        \n",
    "        # layer\n",
    "        self.W = tf.Variable(tf.random_uniform([vocab_size, embedding_dim]))\n",
    "        self.b = tf.Variable(tf.random_uniform([vocab_size]))\n",
    "        \n",
    "        # loss        \n",
    "        self.loss = tf.reduce_mean(tf.nn.nce_loss(weights=self.W,\n",
    "                     biases=self.b,\n",
    "                     labels=self.Y,\n",
    "                     inputs=self.embed,\n",
    "                     num_sampled=64,\n",
    "                     num_classes=vocab_size))        "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 89,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "7355.27872467\n",
      "3092.70562935\n",
      "1670.26257944\n",
      "1151.23467731\n",
      "904.935578346\n",
      "789.814936161\n",
      "719.891431808\n",
      "671.3481884\n",
      "645.542729855\n",
      "625.694592476\n"
     ]
    }
   ],
   "source": [
    "# create model with class\n",
    "model = NGramLanguageModeler(len(vocab), EMBEDDING_DIM, CONTEXT_SIZE)\n",
    "losses = []\n",
    "\n",
    "with tf.Session() as sess:    \n",
    "    # training procedures\n",
    "    optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)\n",
    "    train_op = optimizer.minimize(model.loss)\n",
    "        \n",
    "    # initialize and run all variables so that we can use their values directly\n",
    "    init = tf.global_variables_initializer()\n",
    "    sess.run(init)\n",
    "    \n",
    "    # train cycle with training data\n",
    "    for epoch in range(10):\n",
    "        total_loss = 0\n",
    "        for context, target in trigrams: # collecting batches\n",
    "            context_idxs = list(map(lambda w: word_to_ix[w], context))\n",
    "            \n",
    "            _, embed,loss, params = sess.run([train_op, model.embed, model.loss, model.W],\n",
    "                                                 feed_dict = {model.X:context_idxs,\n",
    "                                                             model.Y:np.array(word_to_ix[target]).reshape(1,1)})\n",
    "            total_loss+=loss\n",
    "        print(total_loss)\n",
    "    "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### TODO: To improve the graph structure by adding namescopes. We can also visualize embeddings using Tensorboard by loggin summaries."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Computing Word Embeddings: Continuous Bag-of-Words\n",
    "The Continuous Bag-of-Words model (CBOW) is frequently used in NLP deep learning. It is a model that tries to predict words given the context of a few words before and a few words after the target word. This is distinct from language modeling, since CBOW is not sequential and does not have to be probabilistic. Typcially, CBOW is used to quickly train word embeddings, and these embeddings are used to initialize the embeddings of some more complicated model. Usually, this is referred to as pretraining embeddings. It almost always helps performance a couple of percent.\n",
    "\n",
    "The CBOW model is as follows.  Given a target word $w_i$ and an $N$ context window on each side, $w_{i-1}, \\dots, w_{i-N}$ and $w_{i+1}, \\dots, w_{i+N}$, referring to all context words collectively as $C$, CBOW tries to minimize,\n",
    "$$ -\\log p(w_i | C) = \\log \\text{Softmax}(A(\\sum_{w \\in C} q_w) + b) $$,\n",
    "where $q_w$ is the embedding for word $w$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 92,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[(['We', 'are', 'to', 'study'], 'about'), (['are', 'about', 'study', 'the'], 'to'), (['about', 'to', 'the', 'idea'], 'study'), (['to', 'study', 'idea', 'of'], 'the'), (['study', 'the', 'of', 'a'], 'idea')]\n"
     ]
    }
   ],
   "source": [
    "CONTEXT_SIZE = 2 # 2 words to the left, 2 to the right\n",
    "raw_text = \"\"\"We are about to study the idea of a computational process. Computational processes are abstract\n",
    "beings that inhabit computers. As they evolve, processes manipulate other abstract\n",
    "things called data. The evolution of a process is directed by a pattern of rules\n",
    "called a program. People create programs to direct processes. In effect,\n",
    "we conjure the spirits of the computer with our spells.\"\"\".split()\n",
    "word_to_ix = { word: i for i, word in enumerate(set(raw_text)) }\n",
    "data = []\n",
    "for i in range(2, len(raw_text) - 2):\n",
    "    context = [ raw_text[i-2], raw_text[i-1], raw_text[i+1], raw_text[i+2] ]\n",
    "    target = raw_text[i]\n",
    "    data.append( (context, target) )\n",
    "print (data[:5])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Building Model with Tensorflow"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 97,
   "metadata": {},
   "outputs": [],
   "source": [
    "class CBOW(object):\n",
    "    def __init__(self, vocab_size, embedding_dim, context_size):\n",
    "        \n",
    "        # inputs and outputs\n",
    "        self.X = tf.placeholder(tf.int32, [ context_size * 2 ]) # context (word sequence)\n",
    "        self.Y = tf.placeholder(tf.int32, [1,1]) # target word \n",
    "        \n",
    "        # Embeddings\n",
    "        self.embeddings = tf.Variable(tf.random_uniform([vocab_size, embedding_dim], -1.0, 1.0))\n",
    "        self.embed = tf.nn.embedding_lookup(self.embeddings, self.X)\n",
    "        \n",
    "        # layer\n",
    "        self.W = tf.Variable(tf.random_uniform([vocab_size, embedding_dim]))\n",
    "        self.b = tf.Variable(tf.random_uniform([vocab_size]))\n",
    "        \n",
    "        # loss        \n",
    "        self.loss = tf.reduce_mean(tf.nn.nce_loss(weights=self.W,\n",
    "                     biases=self.b,\n",
    "                     labels=self.Y,\n",
    "                     inputs=self.embed,\n",
    "                     num_sampled=64,\n",
    "                     num_classes=vocab_size))  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 100,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "3483.83104134\n",
      "1645.26916409\n",
      "980.017921448\n",
      "688.383241653\n",
      "546.793506622\n",
      "466.670639515\n",
      "417.156669617\n",
      "386.634778023\n",
      "369.295153141\n",
      "351.957611561\n"
     ]
    }
   ],
   "source": [
    "# create model with class\n",
    "model = CBOW(len(vocab), EMBEDDING_DIM, CONTEXT_SIZE)\n",
    "losses = []\n",
    "\n",
    "with tf.Session() as sess:    \n",
    "    # training procedures\n",
    "    optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)\n",
    "    train_op = optimizer.minimize(model.loss)\n",
    "        \n",
    "    # initialize and run all variables so that we can use their values directly\n",
    "    init = tf.global_variables_initializer()\n",
    "    sess.run(init)\n",
    "    \n",
    "    # train cycle with training data\n",
    "    for epoch in range(10):\n",
    "        total_loss = 0\n",
    "        for context, target in data: # collecting batches\n",
    "            context_idxs = list(map(lambda w: word_to_ix[w], context))\n",
    "            \n",
    "            _, embed,loss, params = sess.run([train_op, model.embed, model.loss, model.W],\n",
    "                                                 feed_dict = {model.X:context_idxs,\n",
    "                                                             model.Y:np.array(word_to_ix[target]).reshape(1,1)})\n",
    "            total_loss+=loss\n",
    "        print(total_loss)\n",
    "    "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### References\n",
    "- [How to structure your model in Tensorflow](http://web.stanford.edu/class/cs20si/lectures/notes_04.pdf)\n",
    "- [Understanding RNNs and LSTMs](https://colah.github.io/posts/2015-08-Understanding-LSTMs/)\n",
    "- [Introduction to RNN](http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/)\n",
    "- [RNN Text Classification](https://github.com/LunaBlack/RNN-Classification)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
